{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "简历信息提取是企业中经常遇到的场景，我们这一课仅用OpenAI来实现这一功能。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# 加载环境变量\n",
    "load_dotenv()\n",
    "# 从环境变量中读取api_key\n",
    "api_key = os.getenv('ZISHU_API_KEY')\n",
    "base_url = \"http://101.132.164.17:8000/v1\"\n",
    "chat_model = \"glm-4-flash\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "构造client\n",
    "构造client只需要两个东西：api_key和base_url。本教程用的是自塾提供的大模型API服务，在.env文件中已经有了api_key。这个只作为教学用。如果是在生产环境中，还是建议去使用例如智谱、零一万物、月之暗面、deepseek等大厂的大模型API服务。只要有api_key、base_url、chat_model三个东西即可。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "client = OpenAI(\n",
    "    api_key = api_key,\n",
    "    base_url = base_url\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有了这个client，我们就可以去实现各种能力了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_completion(prompt):\n",
    "    response = client.chat.completions.create(\n",
    "        model=chat_model,  # 填写需要调用的模型名称\n",
    "        messages=[\n",
    "            {\"role\": \"user\", \"content\": prompt},\n",
    "        ],\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "先试试这个大模型是否可用："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "我是一个人工智能助手，名为 ChatGLM，是基于清华大学 KEG 实验室和智谱 AI 公司于 2024 年共同训练的语言模型开发的。我的任务是针对用户的问题和要求提供适当的答复和支持。\n"
     ]
    }
   ],
   "source": [
    "response = get_completion(\"你是谁？\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "到这一步说明大模型可用。如果得不到这个回答，就说明大模型不可用，不要往下进行，要先去搞定一个可用的大模型。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们接下来实现一个简历信息提取智能体，只依赖openai库。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime, date\n",
    "from typing import List, Optional\n",
    "from pydantic import BaseModel, Field, field_validator, EmailStr, model_validator\n",
    "\n",
    "# 定义这个pydantic模型是关键的关键\n",
    "class Resume(BaseModel):\n",
    "    name: Optional[str] = Field(None, description=\"求职者姓名，如果没找到就置为空字符串\")\n",
    "    city: Optional[str] = Field(None, description=\"求职者居住地，如果没找到就置为空字符串\")\n",
    "    birthday: Optional[str] = Field(None, description=\"求职者生日，如果没找到就置为空字符串\")\n",
    "    phone: Optional[str] = Field(None, description=\"求职者手机号，如果没找到就置为空字符串\")\n",
    "    email: Optional[str] = Field(None, description=\"求职者邮箱，如果没找到就置为空字符串\")\n",
    "    education: Optional[List[str]] = Field(None, description=\"求职者教育背景\")\n",
    "    experience: Optional[List[str]] = Field(None, description=\"求职者工作或实习经历，如果没找到就置为空字符串\")\n",
    "    project: Optional[List[str]] = Field(None, description=\"求职者项目经历，如果没找到就置为空字符串\")\n",
    "    certificates: Optional[List[str]] = Field(None, description=\"求职者资格证书，如果没找到就置为空字符串\")\n",
    "\n",
    "    @field_validator(\"birthday\", mode=\"before\")\n",
    "    def validate_and_convert_date(cls, raw_date):\n",
    "        if raw_date is None:\n",
    "            return None\n",
    "        if isinstance(raw_date, str):\n",
    "            # List of acceptable date formats\n",
    "            date_formats = ['%d-%m-%Y', '%Y-%m-%d', '%d/%m/%Y', '%m-%d-%Y']\n",
    "            for fmt in date_formats:\n",
    "                try:\n",
    "                    # Attempt to parse the date string with the current format\n",
    "                    parsed_date = datetime.strptime(raw_date, fmt).date()\n",
    "                    # Return the date in MM-DD-YYYY format as a string\n",
    "                    return parsed_date.strftime('%m-%d-%Y')\n",
    "                except ValueError:\n",
    "                    continue  # Try the next format\n",
    "            # If none of the formats match, raise an error\n",
    "            raise ValueError(\n",
    "                f\"Invalid date format for 'consultation_date'. Expected one of: {', '.join(date_formats)}.\"\n",
    "            )\n",
    "        if isinstance(raw_date, date):\n",
    "            # Convert date object to MM-DD-YYYY format\n",
    "            return raw_date.strftime('%m-%d-%Y')\n",
    "\n",
    "        raise ValueError(\n",
    "            \"Invalid type for 'consultation_date'. Must be a string or a date object.\"\n",
    "        )\n",
    "\n",
    "class ResumeOpenAI:\n",
    "    def __init__(self):\n",
    "        self.resume_profile = Resume()\n",
    "        self.output_schema = self.resume_profile.model_json_schema()\n",
    "        self.template = \"\"\"\n",
    "        You are an expert in analyzing resumes. Use the following JSON schema to extract relevant information:\n",
    "        ```json\n",
    "        {output_schema}\n",
    "        ```json\n",
    "        Extract the information from the following document and provide a structured JSON response strictly adhering to the schema above. \n",
    "        Please remove any ```json ``` characters from the output. Do not make up any information. If a field cannot be extracted, mark it as `n/a`.\n",
    "        Document:\n",
    "        ----------------\n",
    "        {resume_content}\n",
    "        ----------------\n",
    "        \"\"\"\n",
    "\n",
    "    def create_prompt(self, output_schema, resume_content):\n",
    "        return self.template.format(\n",
    "            output_schema=output_schema,\n",
    "            resume_content=resume_content\n",
    "        )\n",
    "\n",
    "    def run(self, resume_content):\n",
    "        try:\n",
    "            response = client.chat.completions.create(\n",
    "                model=chat_model,\n",
    "                # 不是所有模型都支持response_format，要看一下调用的模型是否支持这个参数\n",
    "                # 千问、智谱的模型一般支持\n",
    "                response_format={ \"type\": \"json_object\" },\n",
    "                messages=[\n",
    "                    {\"role\": \"system\", \"content\": \"你是一位专业的简历信息提取专家。\"},\n",
    "                    {\"role\": \"user\", \"content\": self.create_prompt(self.output_schema, resume_content)}\n",
    "                ],\n",
    "            )\n",
    "\n",
    "            result = response.choices[0].message.content\n",
    "        except Exception as e:\n",
    "            print(f\"Error occurred: {e}\")\n",
    "\n",
    "        return result\n",
    "\n",
    "resume_openai = ResumeOpenAI()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以用一个示例来测试一下：\n",
    "\n",
    "已知有一个简历文本，这个文本可以是经过简历文件转成图片再OCR识别后的一堆非结构化文本。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 示例输入数据\n",
    "input_data = \"\"\"\n",
    "59a488639e2f882c1nR_2NS6EFJXwZG-UPKaR-WhnPQ~\n",
    "杜素宁\n",
    "MOBILE : 15904130130\n",
    "E-MAIL：0da08x@163.com\n",
    "Address:云南省昭通市\n",
    "个人信息\n",
    "民族：汉 籍贯：云南省昭通市 性别：女 年龄: 22\n",
    "教育经历\n",
    "2008.08-2012.08 北方工业大学 食品科学与工程 学士学位\n",
    "主要经历\n",
    "Project Experience\n",
    "工作经历：\n",
    "1997.06-2010.07 江苏华英企业管理股份有限公司 水处理工程师\n",
    "工作内容:\n",
    "1.负责部门内日常用品的采购；2.做好与公司内其他部门的对接工作；3.协助部门进行办公环境管理和后勤管理工作；4.销\n",
    "售人员与公司的信息交流，随时保持与市场销售人员的电话沟通，销售政策及公司文件的及时传达。5.领导交办的其他工作\n",
    "工作经历：\n",
    "1991年12月-2012年 和宇健康科技股份有限公司 市场营销专员\n",
    "09月\n",
    "工作内容:\n",
    "1、做好消费宾客的迎、送接待工作，接受宾客各种渠道的预定并加以落实；2、礼貌用语，详细做好预订记录；3、了解和\n",
    "收集宾客的建议和意见并及时反馈给上级领导；4、以规范的服务礼节，树立公司品牌优质，文雅的服务形象。\n",
    "工作经历：\n",
    "2007/05-2010/03 深圳市有棵树科技有限公司 拼多多运营\n",
    "工作内容:\n",
    "1.负责规定区域的产品销售，做好产品介绍，确认订单，回款等销售相关工作；2.做好客户背景资料调查，竞争对手分析，\n",
    "产品适用性分析；3.按公司规定完成SalesPipeline信息记录\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们现在来运行一下这个示例简历"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"properties\": {\n",
      "    \"name\": \"杜素宁\",\n",
      "    \"city\": \"云南省昭通市\",\n",
      "    \"birthday\": \"n/a\",\n",
      "    \"phone\": \"15904130130\",\n",
      "    \"email\": \"0da08x@163.com\",\n",
      "    \"education\": [\n",
      "      \"2008.08-2012.08 北方工业大学 食品科学与工程 学士学位\"\n",
      "    ],\n",
      "    \"experience\": [\n",
      "      \"1997.06-2010.07 江苏华英企业管理股份有限公司 水处理工程师\",\n",
      "      \"1991年12月-2012年 和宇健康科技股份有限公司 市场营销专员\",\n",
      "      \"2007/05-2010/03 深圳市有棵树科技有限公司 拼多多运营\"\n",
      "    ],\n",
      "    \"project\": [\n",
      "      \"Project Experience\"\n",
      "    ],\n",
      "    \"certificates\": \"n/a\"\n",
      "  },\n",
      "  \"title\": \"Resume\",\n",
      "  \"type\": \"object\"\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# 运行智能体\n",
    "rec_data = resume_openai.run(input_data)\n",
    "print(rec_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "下面是大模型给出的简历提取结果：\n",
    "\n",
    "{\n",
    "  \"name\": \"杜素宁\",\n",
    "  \"city\": \"云南省昭通市\",\n",
    "  \"birthday\": \"22岁\",\n",
    "  \"phone\": \"15904130130\",\n",
    "  \"email\": \"0da08x@163.com\",\n",
    "  \"education\": [\n",
    "    \"2008.08-2012.08 北方工业大学 食品科学与工程 学士学位\"\n",
    "  ],\n",
    "  \"experience\": [\n",
    "    \"1997.06-2010.07 江苏华英企业管理股份有限公司 水处理工程师\",\n",
    "    \"工作内容: 负责部门内日常用品的采购；做好与公司内其他部门的对接工作；协助部门进行办公环境管理和后勤管理工作；销售人员与公司的信息交流，随时保持与市场销售人员的电话沟通，销售政策及公司文件的及时传达。领导交办的其他工作\",\n",
    "    \"1991年12月-2007.04 和宇健康科技股份有限公司 市场营销专员\",\n",
    "    \"工作内容: 做好消费宾客的迎、送接待工作，接受宾客各种渠道的预定并加以落实；礼貌用语，详细做好预订记录；了解和收集宾客的建议和意见并及时反馈给上级领导；以规范的服务礼节，树立公司品牌优质，文雅的服务形象。\",\n",
    "    \"2007/05-2010/03 深圳市有棵树科技有限公司 拼多多运营\",\n",
    "    \"工作内容: 负责规定区域的产品销售，做好产品介绍，确认订单，回款等销售相关工作；做好客户背景资料调查，竞争对手分析，产品适用性分析；按公司规定完成SalesPipeline信息记录\"\n",
    "  ],\n",
    "  \"project\": \"n/a\",\n",
    "  \"certificates\": \"n/a\"\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dbgpt_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
